Residue network in protein native structure belongs to 
the universality class of three dimensional critical percolation cluster 



Hidetoshi Morita and Mitsunori Takano 

Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan 

(Dated: September 28, 2008) 

A single protein molecule is regarded as a contact network of amino-acid residues. Some studies 
have indicated that this network is a small world network (SWN), while other results have implied 
that this is a fractal network (FN). However, SWN and FN are essentially different in the dependence 
of the shortest path length on the number of nodes. In this paper, we investigate this dependence 
in the residue contact networks of proteins in native structures, and show that the networks are 
not SWN but FN. FN is generally characterized by several dimensions. Among them, we focus 
on three dimensions: the network topological dimension $D_c$, the fractal dimension $D_f$, and the 
spectral dimension $D_s$. We find that proteins universally yield $D_c \approx 1.9$, $D_f \approx 2.5$ and $D_s \approx 1.3$. 
These values agree surprisingly well with those of the three dimensional critical percolation 
cluster. Hence the residue contact networks in the protein native structures belong to the universality 
class of the three dimensional critical percolation cluster. The criticality is relevant to the ambivalent nature 
of the protein native structures, i.e., the coexistence of stability and instability, both of which are 
necessary for a protein to function as a molecular machine or an allosteric enzyme. 
Introduction. Proteins are one-dimensional chains of 
amino-acid residues embedded in three dimensional 
(D = 3; 3D) Euclidean space. The residues neighboring in 
the Euclidean space are in contact with each other. 
Thus we can regard a protein molecule as a contact 
network of amino-acid residues [1, 2]. This network 
viewpoint is complementary to the energy landscape picture 
[3] in understanding the general properties of proteins. 
We hereafter consider this network within single protein 
molecules in their native structures, focusing in 
particular on its universality among proteins. 

Some recent studies [4, 5, 6] have applied recent 
network theory to the residue network, regarding 
the amino-acid residues and their contacts as nodes and 
edges, respectively. The important quantities that 
characterize the network are the clustering coefficient C and the 
shortest path length L. Those studies have demonstrated 
that in the residue networks C is larger than in 
random networks [10], while L is smaller than in the normal 
lattice. This indicates that the residue network is a small 
world network (SWN). 

On the other hand, the spatial profile of residues within 
single protein molecules has long been studied with 
established methods of materials science. Earlier 
spectroscopic studies [13] have shown an anomalous density 
of states. These results, together with theoretical 
studies, have suggested that protein structures 
possess the properties of a fractal lattice. The fractality 
within single proteins has also been supported numerically 
through the density of normal modes [14] 
and the spatial mass distribution [17]. This implies that 
the residue network that we are interested in is a fractal 
network (FN). 

From the general viewpoint of network theory, 
however, there lies a dichotomy between SWN and 
FN [18]. The clustering coefficient C cannot discriminate 
between SWN and FN, since in both networks 
C has a larger value than in random networks. In contrast, 
the dependence of the shortest path length L on 
the number of nodes N is essentially different between 
SWN and FN: L depends on N logarithmically in SWN and 
algebraically in FN. By exploiting the N-dependence 
of L, we can differentiate SWN from FN, in principle. 

In proteins, nevertheless, it is practically difficult to 
distinguish clearly between these two N-dependences. 
This is because the sizes of proteins are not distributed 
widely enough to cover sufficient decades. The same data 
sets can be read as a straight line both in log-log (FN) 
and semi-log (SWN) plots. 

To overcome this difficulty, here we introduce a more 
sophisticated method. Instead of the N-L plot among 
proteins of various sizes, we investigate an equivalent within 
single protein molecules: we calculate the number of 
nodes $n_l$ that can be reached within $l$ path steps. Then, by 
overlaying the plots of $n_l$ versus $l$ for proteins of various sizes, we 
obtain a universal curve, as well as the deviation from it 
due to the finite size effect. Thus we can discuss the 
asymptotic behavior in the large N limit. We thereby find that 
the network in protein native structures is FN, not SWN. 
This is the first result of this letter. 

We then obtain the three characteristic dimensions of 
the fractal residue network: the network topological 
dimension $D_c$, the fractal dimension $D_f$, and the spectral 
dimension $D_s$. Their values are universal among 
single-chain proteins. Furthermore, these three values 
surprisingly coincide with those of the 3D critical 
percolation cluster. Namely, proteins belong to the 
universality class of the 3D critical percolation cluster. This is the 
second and most important result of this letter. 
FIG. 1: Averaged number of nodes $n_l$ that a walker on the 
network starting from a node can visit at least once within $l$ 
steps, plotted in (a) log-log and (b) semi-log scales. 



Small world network vs fractal network. First of all, 
we define the network in a protein native structure. We 
use the spatial information of the native structure in the 
Protein Data Bank (PDB) [19]. We regard amino-acid 
residues as nodes; we represent them by Cα atoms, which 
is a standard way in coarse grained models [3], and is 
indeed employed in the past network studies. 
A pair of nodes, i and j, is considered to have an edge 
if their Euclidean distance, $d_{ij}$, is less than a cut-off 
distance, $d_c$. Then the network is characterized by the 
adjacency matrix: 



$$A_{ij} = \Theta(d_c - d_{ij}), \qquad (1)$$



where $\Theta(\cdot)$ is the Heaviside step function. Here we adopt 
$d_c = 7$ Å, which corresponds to the second coordination 
shell in the radial distribution function of Cα; we have 
also confirmed that the results below are robust to the 
choice of $d_c$ from 6 to 10 Å [20]. 
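
The construction of the contact network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `contact_network`, the adjacency-list representation, and the toy coordinates are not from the paper, and self-contacts (i = j) are left out here by choice.

```python
import math

def contact_network(coords, d_c=7.0):
    """Adjacency lists of the contact network A_ij = Theta(d_c - d_ij).

    coords: list of (x, y, z) positions in angstroms, one per residue
            (for a protein these would be the C-alpha atoms).
    Self-contacts (i == j) are omitted, which is an assumption of this sketch.
    """
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # An edge exists when the Euclidean distance is below the cut-off.
            if math.dist(coords[i], coords[j]) < d_c:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

# Toy input, not a real protein: four points on a line, 3.8 angstroms apart.
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(4)]
toy_network = contact_network(toy_coords)
```

With the default cut-off only consecutive points are linked in this toy chain; raising `d_c` to 8.0 also links second neighbors, which is the kind of sensitivity check the text describes for cut-offs between 6 and 10 Å.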

Let $n_l^{(i)}$ be the number of nodes that a walker on 
the network starting from the node i can visit at least 
once within $l$ steps. Since we are interested in the overall 
network property of a protein, we consider its average, 
$n_l = \sum_i n_l^{(i)}/N$. As $l$ becomes larger, $n_l$ 
monotonically increases and finally saturates at N. In the 
D-dimensional normal lattice, $n_l \sim l^D$. If the network is 
FN, similarly, the following scaling holds [18]: 

$$n_l \sim l^{D_c}, \qquad (2)$$

where $D_c$ is referred to as the network topological 
dimension [21, 22]. If the network is SWN, in contrast, the 
relationship is [18] 



$$n_l \sim \exp(l/l_0), \qquad (3)$$

for a positive constant $l_0$. Note again that the 
relationships (2) and (3) are essentially different, leading to the 
dichotomy between FN and SWN [18]. 
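
The quantity $n_l$ can be computed by a breadth-first search from every node. The following is a minimal sketch, assuming the network is given as adjacency lists and that the starting node itself is counted (so the count is 1 at $l = 0$); the function names and the toy path graph are illustrative and not taken from the paper.

```python
from collections import deque

def reachable_counts(neighbors, start, l_max):
    """Number of nodes within l edges of `start`, for l = 0..l_max.

    The start node is included in the count, an assumption of this sketch.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == l_max:
            continue  # do not expand beyond l_max steps
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = [0] * (l_max + 1)
    for d in dist.values():
        counts[d] += 1
    for l in range(1, l_max + 1):
        counts[l] += counts[l - 1]  # cumulative: distance <= l
    return counts

def average_reachable(neighbors, l_max):
    """Average of the per-node counts over all starting nodes."""
    n = len(neighbors)
    totals = [0] * (l_max + 1)
    for i in range(n):
        for l, c in enumerate(reachable_counts(neighbors, i, l_max)):
            totals[l] += c
    return [t / n for t in totals]

# Toy example: the path graph 0-1-2-3 (not a protein).
path = [[1], [0, 2], [1, 3], [2]]
n_l = average_reachable(path, 3)
```

For the toy path graph the averaged counts grow from 1 at $l = 0$ and saturate at N = 4 at $l = 3$, mirroring the saturation described in the text.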

FIG. 1 shows the relationship between $n_l$ and $l$; the 
same data sets are plotted in (a) log-log and (b) semi-log 
scales. We present the data for five representative 
proteins of different sizes: ribonuclease T1 (PDB ID = 9RNT, 
104 amino acids (a.a.)), cutinase (1CUS, 200 a.a.), green 
fluorescent protein (1EMA, 236 a.a.), actin (1J6Z, 375 
a.a.), and subfragment 1 of myosin (1SR6, 1152 a.a.). 
The data clearly obey the power-law scaling better 
than the exponential dependence. This is also supported 
by considering the finite size effect as follows. In (a), the 
range where the data follow the power-law scaling tends 
to extend as the number of nodes N increases. This 
suggests the existence of an asymptotic universal line in the 
limit of $N \to \infty$. In (b), on the contrary, we cannot 
see such an asymptotic tendency. Thus we conclude that 
proteins universally obey the power-law scaling (2) with 
$D_c \approx 1.9$. Hence the networks in protein native 
structures are FN, not SWN. 
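
One simple way to compare the two candidate scalings numerically is to fit a straight line in both representations and compare the residuals. The sketch below is an assumption-laden illustration, not the procedure used in the paper: it uses ordinary least squares on synthetic data generated as $l^{1.9}$, and the function names are hypothetical. For real data the saturated region near N would have to be excluded before fitting.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares y = a*x + b; returns (a, b, rms_residual)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    rms = math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n)
    return a, b, rms

def compare_scalings(steps, counts):
    """Fit log(n_l) against log(l) (power law) and against l (exponential)."""
    log_n = [math.log(c) for c in counts]
    power = linear_fit([math.log(l) for l in steps], log_n)
    expo = linear_fit([float(l) for l in steps], log_n)
    return power, expo

# Synthetic data following an exact power law with exponent 1.9.
steps = list(range(1, 11))
counts = [l ** 1.9 for l in steps]
power_fit, exp_fit = compare_scalings(steps, counts)
```

Here the slope of the log-log fit recovers the exponent used to generate the data, and its residual is smaller than that of the semi-log fit, which is the sense in which a power law "fits better" than an exponential.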

In much larger proteins, $D_c$ often takes a slightly larger 
value than 1.9, or the scaling itself is even smeared. 
This is because the larger proteins are usually neither 
single-domain nor single-chain, but multi-domain or multi-chain 
proteins. Even in such proteins, however, each 
single-domain or single-chain component still yields the same 
scaling law with the same dimension $D_c \approx 1.9$ [20]. 

One plausible reason why the network is not SWN but 
FN is that the residues are spatially restricted in the 3D 
Euclidean space. Indeed, it has been suggested that 
networks with spatial (geographical) restriction tend to be 
regular (including fractal) networks rather than SWN [18]. 

Fractal dimension. In addition to the network 
topological dimension $D_c$, FN is characterized by two other 
dimensions in general: the fractal dimension $D_f$ and the 
spectral dimension $D_s$ [21]. While these three 
dimensions and the Euclidean dimension D are identical in the 
normal lattice, they can be different in FN. 

The fractal dimension is determined from the spatial 
distribution of nodes. Here we again employ the method 
within single proteins, unlike the previous 
studies [17], in order to discuss the asymptotic behavior in 
the limit $N \to \infty$. Let $n^{(i)}(d)$ be the number of nodes 
whose distance from the node i is less than d: 
$n^{(i)}(d) = \sum_j \Theta(d - d_{ij})$. Since we are interested in the 
overall network property of a protein, we consider its 
average, $n(d) = \sum_i n^{(i)}(d)/N$, that is, 

$$n(d) = \frac{1}{N} \sum_i \sum_j \Theta(d - d_{ij}). \qquad (4)$$

Note that this is nothing but the correlation integral 
introduced by Grassberger and Procaccia [23], although 
it is not normalized here in order to consider the finite size 
effect. As d becomes larger, n(d) monotonically increases 
and finally saturates at N. In the D-dimensional normal 
lattice, $n(d) \sim d^D$. Similarly, if the spatial distribution 
of nodes is fractal, 



$$n(d) \sim d^{D_f}, \qquad (5)$$

where $D_f$ is referred to as the fractal dimension. 

FIG. 2: Averaged number of residues n(d) whose distance 
from a residue is less than d, for the same proteins as 
FIG. 1. 

FIG. 3: Density of normal modes $\rho(\omega)$ of (a) ribonuclease T1 
(PDB ID = 9RNT) and (b) cutinase (1CUS). Various bin sizes 
$\Delta\omega$ are taken so as to display the master curve more clearly. 

FIG. 2 shows n(d) versus d in log-log scale, for the same 
proteins as FIG. 1. The relationship follows power-law 
scaling. As in the case of the network topological 
dimension, this is supported by considering the finite size 
effect: the power-law scaling range tends to extend as the 
number of nodes increases, suggesting the existence of an 
asymptotic universal line in the limit $N \to \infty$. Thus we 
conclude that proteins universally follow the power-law 
scaling (5) with the fractal dimension $D_f \approx 2.5$, which is 
consistent with the previous studies [17]. 
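
The unnormalized correlation integral n(d) described in this section can be evaluated directly from the node coordinates. The following sketch is illustrative only: the function name and the toy coordinates are assumptions, and the brute-force double loop over all pairs is chosen for clarity rather than speed. The term j = i is kept, so that n(d) saturates at N as stated in the text.

```python
import math

def correlation_counts(coords, radii):
    """Unnormalized correlation integral n(d) = (1/N) sum_i sum_j Theta(d - d_ij).

    The j == i term (d_ii = 0) is included, so n(d) reaches N for large d.
    """
    n = len(coords)
    # All N*N pair distances, including the zero self-distances.
    dists = [math.dist(p, q) for p in coords for q in coords]
    return [sum(1 for x in dists if x < d) / n for d in radii]

# Toy input, not a real protein: four points on a line with unit spacing.
toy_points = [(float(k), 0.0, 0.0) for k in range(4)]
n_d = correlation_counts(toy_points, [0.5, 1.5, 2.5, 3.5])
```

The fractal dimension would then be read off as the slope of log n(d) against log d over the range where the scaling holds, before n(d) saturates.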

Spectral dimension. The spectral dimension is 
determined from the density of normal modes (DNM). 
According to the Debye theory, DNM in the D-dimensional normal 
lattice is $\rho(\omega) \sim \omega^{D-1}$. Similarly, DNM in FN obeys 

$$\rho(\omega) \sim \omega^{D_s - 1}, \qquad (6)$$

where $D_s$ is referred to as the spectral dimension [22]. 

DNM is, in general, obtained experimentally by 
spectroscopies and numerically by normal mode analysis 
(NMA). To be relevant to experiments, we conduct NMA 
in the all atom model, not in a coarse grained model. 
Then, by focusing on the frequency region corresponding 
to the residue-residue interaction, we consider the 
spectral dimension of the residue network. We do so because 
for NMA it is necessary to take the interaction strengths 
precisely into account. In the all atom model, the 
interaction strengths are quite reliable, since they are basically 
obtained from quantum chemical calculations. In a coarse 
grained model, in contrast, the interaction strengths are 
introduced rather arbitrarily. It is true that coarse 
grained models reproduce well the overall fluctuation 
of the protein native structure. This is, however, 
largely due to the fact that only a limited number of 
lowest frequency normal modes (or largest amplitude 
principal components) dominate the fluctuation. There is no 
guarantee that they also reproduce DNM over decades of frequency. 
Indeed, it has been reported that there is an essential 
difference in DNM between the all atom model and the coarse 
grained model with identical interaction strengths [24]. 
Instead, here we coarse grain DNM itself, by truncating 
the higher frequency region. We perform NMA by 
using the program NMODE implemented in the AMBER 
software [25], with the AMBER force field (parm99) and an 
implicit water (Generalized Born) model. Before NMA, 
energy minimization is executed with the Newton-Raphson and 
conjugate gradient methods, so that the norm of the force 
is less than the order of 10^^^ kcal mol$^{-1}$ Å$^{-1}$. 

We have obtained DNM for several proteins, and FIG. 3 
shows typical results; these are essentially similar to one 
of the previous numerical studies [13]. There exist two 
shoulders at around 10 and 100 cm$^{-1}$, which are denoted 
respectively by $\omega_{FS}$ and $\omega_{GL}$. The frequency region higher than 
$\omega_{GL}$ corresponds to local motions, due to covalent-bond 
stretching and angle bending. The frequency region 
lower than $\omega_{GL}$, in contrast, corresponds to global 
motions due to residue-residue interactions, which we are 
now interested in. In the latter region, DNM obeys the 
power-law scaling (6) with $D_s \approx 1.3$. At around $\omega_{FS}$, 
the dimension changes from 1.3 to 3.0. This is due to the 
finite size effect: through a long wave-length probe, the 
protein is regarded just as a 3D object. Indeed, a similar 
change in slope due to the finite size effect is observed in 
percolation clusters [25]. We expect that, in much larger 
proteins, $\omega_{FS}$ shifts towards lower frequencies, 
and accordingly the region of $D_s \approx 1.3$ becomes 
wider. Thus we conclude that the residue-residue interaction 
in proteins universally follows the power-law scaling (6) 
with the spectral dimension $D_s \approx 1.3$. 

We now discuss the reason why some of the previous 
studies [14, 15] gave $D_s$ larger than 1.3. In these studies, $D_s$ 
was obtained not from DNM, i.e., the probability 
density function $\rho(\omega)$, but from its cumulative distribution 
function $\Omega(\omega) = \int_0^{\omega} d\omega' \rho(\omega')$. $D_s$ obtained from $\rho(\omega)$ 
is identical with that from $\Omega(\omega)$ if a single scaling holds 
over the whole range considered. In proteins, however, 
the scaling changes at around $\omega_{FS}$ due to the finite size 
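effect. This accordingly gives an illusory larger value 
of $D_s$. To illustrate this simply, we model the 
probability density function as a function that sharply changes the 
scaling at $\omega_{FS}$, 

$$\rho(\omega) = \begin{cases} \dfrac{C}{\omega_{FS}} \left(\dfrac{\omega}{\omega_{FS}}\right)^{D-1} & (\omega < \omega_{FS}) \\ \dfrac{C}{\omega_{FS}} \left(\dfrac{\omega}{\omega_{FS}}\right)^{D_s-1} & (\omega > \omega_{FS}), \end{cases} \qquad (7)$$

with a dimensionless positive constant C. Its cumulative 
distribution function is 

$$\Omega(\omega) = \begin{cases} \dfrac{C}{D} \left(\dfrac{\omega}{\omega_{FS}}\right)^{D} & (\omega < \omega_{FS}) \\ \dfrac{C}{D_s} \left[ \left(\dfrac{\omega}{\omega_{FS}}\right)^{D_s} - \left(1 - \dfrac{D_s}{D}\right) \right] & (\omega > \omega_{FS}). \end{cases} \qquad (8)$$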

The gradient of $\log \Omega$ versus $\log \omega$ gives a larger value than 
the correct spectral dimension $D_s$ at around $\omega \gtrsim \omega_{FS}$. 
The gradient would yield $D_s$ only in the region $\omega/\omega_{FS} \gg 
(1 - D_s/D)^{1/D_s}$. In proteins, D = 3 and $D_s = 1.3$, 
and then $\omega/\omega_{FS} \gg 0.56$. This region, however, corresponds 
to the local motions, not to the global residue-residue 
interactions in which we have discovered the universality. 
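
The overestimate can be made concrete with a short numerical sketch. It assumes a piecewise power-law density that is continuous at $\omega_{FS}$, with exponent D − 1 below and $D_s$ − 1 above, written in the reduced variable x = ω/ω_FS and in units where the prefactor is 1; this specific form, the function names, and the sample points are assumptions made for illustration.

```python
def density(x, D=3.0, D_s=1.3):
    """Assumed model density of modes; x = omega / omega_FS, prefactor set to 1."""
    return x ** (D - 1) if x < 1.0 else x ** (D_s - 1)

def cumulative(x, D=3.0, D_s=1.3):
    """Closed-form integral of `density` from 0 to x."""
    if x < 1.0:
        return x ** D / D
    return 1.0 / D + (x ** D_s - 1.0) / D_s

def local_slope(x, D=3.0, D_s=1.3):
    """Local log-log gradient of the cumulative: d log(Omega) / d log(omega)."""
    return x * density(x, D, D_s) / cumulative(x, D, D_s)

# Gradient of the cumulative distribution at a few reduced frequencies.
slopes = {x: local_slope(x) for x in (0.5, 1.0, 2.0, 10.0, 100.0)}
```

In this model the gradient equals D up to x = 1 and then decays only gradually towards $D_s$, so a fit to the cumulative distribution just above $\omega_{FS}$ returns a value between $D_s$ and D, while the density itself already has the slope $D_s$ − 1 there.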

Conclusion: universality class of 3D critical percolation 
cluster. We have thus obtained the characteristic 
dimensions of FN inherent in the protein native structures, 
$(D, D_c, D_f, D_s) = (3, 1.9, 2.5, 1.3)$. Note that these 
dimensions agree surprisingly well with those 
of the 3D critical percolation cluster, $(D, D_c, D_f, D_s) = 
(3, 1.885, 2.53, 1.3)$ [21]. Hence we here propose that the 
protein native structures belong to the universality class 
of the 3D critical percolation cluster. This is the main 
statement of this letter. 

Then why are proteins, as residue-contact networks, 
critically percolated? Although it is difficult to give a 
complete answer at the present stage of this study, we 
can still provide a purposive explanation by pointing out two 
important aspects of proteins: stability and instability. 
On the one hand, proteins fold into their own (almost) 
unique native structures. Even when they are forced to 
unfold, they refold back into the native structures 
spontaneously (often with help from molecular chaperones). 
In this sense, proteins are stable. On the other hand, 
proteins flexibly change their structures. The structural 
change is sometimes accompanied even by (partial) 
unfolding. In this sense, proteins are unstable. The 
coexistence of these two conflicting aspects is essential for the 
functions of proteins, in particular for working as 
molecular machines or allosteric enzymes. Being in the critical 
state is sufficient for that. Furthermore, the criticality 
may even be necessary; proteins should evolve towards 
the critical state [26, 27]. This hypothesis should be 
verified through the study of molecular evolution, which is 
a challenging subject for the future. 